(57) Abstract

A method and apparatus that performs blind source separation using
convolutive signal decorrelation. More specifically, the method
accumulates a length of input signal (mixed signal) that comprises a
plurality of independent signals from independent signal sources. The
invention then divides the length of input signal into a plurality of
T-length periods (windows) and performs a discrete Fourier transform
(DFT) on the signal within each T-length period. Thereafter, estimated
cross-correlation values are computed using a plurality of the average
DFT values. A total number of K cross-correlation values are computed,
where each of the K values is averaged over N of the T-length periods.
Using the cross-correlation values, a gradient descent process computes
the coefficients of a FIR filter that will effectively separate the
source signals within the input signal. To achieve an accurate solution,
the gradient descent process is constrained in that the time-domain
values of the filter coefficients can attain only certain values, i.e.,
the time-domain filter coefficient values $W(\tau)$ are constrained
within each T-length period to be zero for any time $\tau > Q$. In this
manner, a unique solution for the FIR filter coefficients is computed
and a filter produced using these coefficients will effectively separate
the source signals.

CONVOLUTIVE BLIND SOURCE SEPARATION
USING A MULTIPLE DECORRELATION METHOD

This application claims the benefit of U.S. Provisional Application No.
60/081,101 filed April 8, 1998, which is herein incorporated by reference.

The invention relates to signal processing and, more particularly, the 
invention relates to a method and apparatus for performing signal separation 
using a multiple decorrelation technique. 

BACKGROUND OF THE INVENTION



A growing number of researchers have recently published techniques that
perform blind source separation (BSS), i.e., separating a composite signal into
its constituent component signals without a priori knowledge of those signals.
These techniques find use in various applications such as speech detection using
multiple microphones, crosstalk removal in multichannel communications,
multipath channel identification and equalization, direction of arrival (DOA)
estimation in sensor arrays, improvement of beam forming microphones for
audio and passive sonar, and discovery of independent source signals in various
biological signals, such as EEG, MEG and the like. Many of the BSS techniques
require (or assume) statistical independence between the component signals to
accurately separate the signals. Additional theoretical progress in signal
modeling has generated new techniques that address the problem of identifying
statistically independent signals, a problem that lies at the heart of source
separation.

The basic source separation problem is simply described by assuming $d_s$
statistically independent sources $s(t) = [s_1(t), \ldots, s_{d_s}(t)]^T$ that
have been convolved and mixed in a linear medium leading to sensor signals
$x(t) = [x_1(t), \ldots, x_{d_x}(t)]^T$ that may include additional sensor
noise $n(t)$. The convolved, noisy signal is represented in the time domain by
the following equation (known as a forward model):



$$x(t) = \sum_{\tau=0}^{P} A(\tau)\, s(t-\tau) + n(t) \qquad (1)$$

Source separation techniques are used to identify the $d_x d_s P$ coefficients
of the channels A and to ultimately determine an estimate $\hat{s}(t)$ for the
unknown source signals.

Alternatively, the convolved signal may be filtered using a finite impulse
response (FIR) inverse model represented by the following equation (known as
the backward model):

$$u(t) = \sum_{\tau=0}^{Q} W(\tau)\, x(t-\tau) \qquad (2)$$

In this representation, a BSS technique must estimate the FIR inverse
components W such that the model source signals
$u(t) = [u_1(t), \ldots, u_{d_s}(t)]^T$ are statistically independent.
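As an illustration only, the backward model of equation 2 can be sketched in Python with NumPy. The function name `apply_backward_model` and the array layout (taps along the first axis of `W`, channels along the first axis of `x`) are assumptions made for this sketch and do not come from the specification; samples before the start of the record are taken to be zero.

```python
import numpy as np

def apply_backward_model(W, x):
    """Evaluate u(t) = sum_tau W[tau] @ x(t - tau) (equation 2).

    W : (n_taps, d_s, d_x) array of FIR matrix taps (hypothetical layout).
    x : (d_x, n_samples) array of sensor signals, n_samples >= n_taps.
    Samples of x before t = 0 are treated as zero.
    """
    n_taps, d_s, _ = W.shape
    n_samples = x.shape[1]
    u = np.zeros((d_s, n_samples), dtype=np.result_type(W, x))
    for tau in range(n_taps):
        # Tap tau mixes the sensor signals delayed by tau samples.
        u[:, tau:] += W[tau] @ x[:, :n_samples - tau]
    return u
```

A filter whose only non-zero tap is an identity matrix at delay 1 simply delays every channel by one sample, which gives a quick sanity check of the indexing.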

An approach to performing source separation under the statistical
independence condition has been discussed in Weinstein et al., "Multi-Channel
Signal Separation by Decorrelation", IEEE Transactions on Speech and Audio
Processing, vol. 1, no. 4, pp. 405-413, 1993, where, for non-stationary signals, a
set of second order conditions is specified that uniquely determines the
parameters A in the forward model. However, no specific algorithm for
performing source separation based on non-stationarity is given in the
Weinstein et al. paper.

Early work in the signal processing community had suggested
decorrelating the measured signals, i.e., diagonalizing measured correlations for
multiple time delays. For an instantaneous mix, also referred to as the constant
gain case, it has been shown that for non-white signals, decorrelation using
multiple filter taps is sufficient to recover the source signals. However, for
convolutive mixtures of wide-band signals, this technique does not produce a
unique solution and, in fact, may generate source estimates that are
decorrelated but not statistically independent. As clearly identified by
Weinstein et al. in the paper cited above, additional conditions are required to
achieve a unique solution of statistically independent sources. In order to find
statistically independent source signals, it is necessary to capture more than
second order statistics, since statistical independence requires that not only
second but all higher cross moments vanish.

In the convolutive case, Yellin and Weinstein in "Multichannel Signal
Separation: Methods and Analysis", IEEE Transactions on Signal Processing, vol.
44, no. 1, pp. 106-118, January 1996, established conditions on higher order
multi-tap cross moments that allow convolutive cross talk removal. Although
the optimization criteria extend naturally to higher dimensions, previous
research has concentrated on a two dimensional case because a multi-channel
FIR model (see equation 2) can be inverted with a properly chosen architecture
using estimated forward filters. Heretofore, for higher dimensions, finding a
stable approximation of the forward model has been elusive.

These prior art techniques generally operate satisfactorily in computer
simulations but perform poorly for real signals, e.g., audio signals. One could
speculate that the signal densities of the real signals may not have the
hypothesized structures, the higher order statistics may lead to estimation
instabilities, or a violation of the signal stationarity condition may cause
inaccurate solutions.

Therefore, there is a need in the art for a blind source separation
technique that accurately performs convolutive signal decorrelation.


SUMMARY OF THE INVENTION 
The disadvantages of the prior art are overcome by a method and
apparatus that performs blind source separation using convolutive signal
decorrelation by simultaneously diagonalizing second order statistics at multiple
time periods. More specifically, the invention accumulates a length (segment) of
input signal that comprises a mixture of independent signal sources. The
invention then divides the length of input signal into a plurality of T-length
periods (windows) and performs a discrete Fourier transform (DFT) on the
mixed signal over each T-length period. Thereafter, the invention computes K
cross-correlation power spectra that are each averaged over N of the T-length
periods. Using the cross-correlation power values, a gradient descent process
computes the coefficients of a FIR filter that will effectively separate the source




WO 99/52211 PCT/US99/07707

signals within the input signal by simultaneously decorrelating the K
cross-correlation power spectra. To achieve an accurate solution, the gradient
descent process is constrained in that the time-domain values of the filter
coefficients can attain only certain values, i.e., the time-domain filter
coefficient values $W(\tau)$ are constrained within the T-length period to be
zero for any time $\tau > Q$, where $Q \ll T$. In this manner, the so-called
"permutation problem" is solved and a unique solution for the FIR filter
coefficients is computed such that a filter produced using these coefficients
will effectively separate the source signals.

Generally, the invention is implemented as a software routine that is
stored in a storage medium and executed on a general purpose computer
system. However, a hardware implementation is readily apparent from the
following detailed description.

The present invention finds application in a voice recognition system as a
signal preprocessor system for decorrelating signals from different sources such
that a voice recognition processor can utilize the various voice signals that are
separated by the invention. In response to the voice signals, the voice
recognition processor can then produce computer commands or computer text.



BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by
considering the following detailed description in conjunction with the
accompanying drawings, in which:

FIG. 1 depicts a system for executing a software implementation of the
present invention;

FIG. 2 is a flow diagram of a method of the present invention;

FIG. 3 depicts a frequency domain graph of the filter coefficients
generated by the present invention; and

FIG. 4 depicts a time domain graph of the filter coefficients generated by
the present invention.




To facilitate understanding, identical reference numerals have been used,
where possible, to designate identical elements that are common to the figures.



DETAILED DESCRIPTION 


The present invention estimates values of parameters W for the backward
model of equation 2 by assuming non-stationary source signals and using a least
squares (LS) optimization to estimate W as well as signal and noise powers. The
invention transforms the source separation problem into the frequency domain
and solves simultaneously a source separation problem for every frequency.

FIG. 1 depicts a system 100 for implementing the source separation
method of the present invention. The system 100 comprises a convolved signal
source 126 that supplies the signal that is to be separated into its component
signals and a computer system 108 that executes the multiple decorrelation
routine 124 of the present invention. The source 126 may contain any source of
convolved signals, but is illustratively shown to contain a sensor array 102, a
signal processor 104 and a recorded signal source 106. The sensor array
contains one or more transducers 102A, 102B, 102C such as microphones. The
transducers are coupled to a signal processor 104 that performs signal
digitization. A digital signal is coupled to the computer system 108 for signal
separation and further processing. A recorded signal source 106 may optionally
form a source of the convolutive signals that require separation.

The computer system 108 comprises a central processing unit (CPU) 114,
a memory 122, support circuits 116, and an input/output (I/O) interface 120.
The computer system 108 is generally coupled through the I/O interface 120 to
a display 112 and various input devices 110 such as a mouse and keyboard. The
support circuits generally contain well-known circuits such as cache, power
supplies, clock circuits, a communications bus, and the like. The memory 122
may include random access memory (RAM), read only memory (ROM), disk
drive, tape drive, and the like, or some combination of memory devices. The
invention is implemented as the multiple decorrelation routine 124 that is
stored in memory 122 and executed by the CPU 114 to process the signal from
the signal source 126. As such, the computer system 108 is a general purpose
computer system that becomes a specific purpose computer system when
executing the routine 124 of the present invention. Although a general purpose
computer system is illustratively shown as a platform for implementing the
invention, those skilled in the art will realize that the invention can also be
implemented in hardware as an application specific integrated circuit (ASIC), a
digital signal processing (DSP) integrated circuit, or other hardware device or
devices. As such, the invention may be implemented in software, hardware, or a
combination of software and hardware.

The illustrative computer system 108 further contains a speech recognition
processor 118, e.g., a speech recognition circuit card or speech recognition
software, that is used to process the component signals that the invention
extracts from the convolutive signal. As such, a conference room having a
plurality of people speaking and background noise can be monitored with
multiple microphones 102. The microphones 102 produce a composite speech
signal that requires separation into component signals if a speech recognition
system is to be used to convert each person's speech into computer text or into
computer commands. The composite speech signal is filtered, amplified and
digitized by the signal processor 104 and coupled to the computer system 108.
The CPU 114, executing the multiple decorrelation routine 124, separates the
composite signal into its constituent signal components. From these constituent
components, background noise can easily be removed. The constituent
components without noise are then coupled to the speech recognition processor
118 to process the component signals into computer text or computer
commands. In this manner, the computer system 108 while executing the
multiple decorrelation routine 124 is performing signal pre-processing or
conditioning for the speech recognition processor 118.

FIG. 2 depicts a flow diagram of the multiple decorrelation routine 124 of
the present invention. At step 200, the convolutive (mixed) signal is input, the
signal is parsed into a plurality of windows containing T samples of the input
signal $X(t)$, and the routine produces discrete Fourier transform (DFT) values
$\chi(t)$ for each window, i.e., one DFT value for each window of length T
samples. At step 202, the routine 124 uses the DFT values to accumulate K
cross-correlation power spectra, where each of the K spectra is averaged over N
windows of length T samples.
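The windowing and transform of step 200 might be sketched as follows in Python with NumPy. This is a minimal sketch under stated assumptions: the function name `windowed_dft`, the non-overlapping windows, the absence of a taper, and the output layout are all choices made for illustration rather than details given in the text.

```python
import numpy as np

def windowed_dft(x, T):
    """Split a multichannel signal into T-sample windows and take a DFT of each.

    x : (d_x, n_samples) array of sensor signals.
    T : window length in samples.
    Returns chi with shape (n_windows, T, d_x): window index, frequency bin,
    sensor channel (hypothetical layout). Trailing samples that do not fill
    a complete window are dropped.
    """
    d_x, n_samples = x.shape
    n_windows = n_samples // T
    frames = x[:, :n_windows * T].reshape(d_x, n_windows, T)
    chi = np.fft.fft(frames, axis=-1)        # DFT along the time axis
    return np.transpose(chi, (1, 2, 0))      # -> (n_windows, T, d_x)
```

Overlapping windows, which the description permits, would replace the reshape with a strided framing step.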

For non-stationary signals, the cross correlation estimates will be
dependent on the absolute time and will indeed vary from one estimation
segment (an NT period) to the next. The cross correlation estimates computed
in step 204 are represented as:

$$\hat{R}_x(t,\nu) = \frac{1}{N}\sum_{n=0}^{N-1} \chi(t+nT,\nu)\,\chi^H(t+nT,\nu) \qquad (5)$$

where:

$$\chi(t+nT,\nu) = \mathrm{FFT}\; X(t+nT)$$

$$X(t) = [x(t), \ldots, x(t+T-1)]$$

and $\chi(\nu)$ is the FFT of the input signal within a window containing T samples.
As such, the routine, at step 204, computes a matrix for each time t and for each
frequency $\nu$ and then sums all the matrix components with each other matrix
component. Steps 206, 208, 210 and 212 iterate the correlation estimation of
step 204 over n = 0 to N and k = 0 to K to produce the K spectra.
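The averaging of equation 5 can be illustrated with a short NumPy sketch. The function name `cross_power_spectra` and the array layout are assumptions; the sketch also assumes non-overlapping estimation segments, so that segment k covers windows kN through kN + N - 1.

```python
import numpy as np

def cross_power_spectra(chi, N):
    """Average chi chi^H over N consecutive windows (equation 5).

    chi : (n_windows, T, d_x) complex DFT values, one row per window
          (hypothetical layout).
    N   : number of windows averaged per estimate.
    Returns R with shape (K, T, d_x, d_x), K = n_windows // N, where
    R[k, f] is the d_x-by-d_x cross-power matrix of segment k at bin f.
    """
    n_windows, T, d_x = chi.shape
    K = n_windows // N
    segments = chi[:K * N].reshape(K, N, T, d_x)
    # Outer product chi chi^H per window and frequency, summed over the
    # N windows of each segment, then divided by N.
    return np.einsum('knfi,knfj->kfij', segments, np.conj(segments)) / N
```

Each returned matrix is Hermitian with a real, non-negative diagonal, which is a convenient check when wiring this into a larger routine.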

Equation 5 can then be simplified to a matrix representation:

$$\hat{R}_x(t,\nu) = A(\nu)\,\Lambda_s(t,\nu)\,A^H(\nu) + \Lambda_n(t,\nu) \qquad (6)$$

If N is sufficiently large, $\Lambda_s(t,\nu)$ and $\Lambda_n(t,\nu)$ can be modeled as
diagonal matrices due to the signal independence assumption. For Equation 6 to be
linearly independent for different times, it will be necessary that $\Lambda_s(t,\nu)$
changes over time, i.e., the signals are non-stationary.

Using the cross correlation estimates of equation 6, the invention
computes the source signals using cross-power-spectra satisfying the following
equation:

$$\Lambda_s(t,\nu) = W(\nu)\,\bigl(\hat{R}_x(t,\nu) - \Lambda_n(t,\nu)\bigr)\,W^H(\nu) \qquad (7)$$

In order to obtain independent conditions for every time period, the time
periods are generally chosen to have non-overlapping estimation times for
$\hat{R}_x(t_k,\nu)$, i.e., $t_k = kTN$. But if the signals vary sufficiently fast,
overlapping estimation times may be utilized. Furthermore, although the windows T are
generally sequential, the windows may overlap one another such that each DFT
value is derived from signal information that is also contained in the previous
window. In an audio signal processing system, the specific value of T is selected
based upon room acoustics of the room in which the signals are recorded. For
example, a large room with many reverberation paths requires a long window T
such that the invention can process a substantial amount of signal information
to achieve source signal separation. The value of N is generally determined by
the amount of available data for processing. Typical values are N = 20, T =
1024 samples and K = 5.

The inventive method computes a multipath channel W (i.e., the tap
values of a multidimensional FIR filter) that simultaneously satisfies equation 7
for K estimation periods, e.g., 2 to 5 estimation periods for processing audio
signals. Such a process is performed at steps 214, 216, 218 (collectively a filter
parameter estimation process 224) and is represented using a least squares
estimation procedure as follows:

$$\hat{W}, \hat{\Lambda}_s, \hat{\Lambda}_n = \arg\min_{\substack{W, \Lambda_s, \Lambda_n \\ W(\tau)=0,\ \tau>Q}} \sum_{\nu=1}^{T}\sum_{k=1}^{K} \|E(k,\nu)\|^2 \qquad (8)$$

where

$$E(k,\nu) = W(\nu)\,\bigl(\hat{R}_x(k,\nu) - \Lambda_n(k,\nu)\bigr)\,W^H(\nu) - \Lambda_s(k,\nu)$$

For simplicity, a short form nomenclature has been used in equation 8, where
$\Lambda_s(k,\nu) = \Lambda_s(t_k,\nu)$ and $\Lambda_s = \Lambda_s(t_1,\nu), \ldots, \Lambda_s(t_K,\nu)$,
and the same simplified notation also applies to $\Lambda_n(t,\nu)$ and $\hat{R}_x(t,\nu)$.

To produce the parameters W, a gradient descent process 224 (containing
steps 214, 216, 218, and 220) is used that iterates the values of W as cost
function (8) is minimized. In step 216, the W values are updated as
$W^{new} = W^{old} - \mu \nabla_W E$, where $\nabla_W E$ is the gradient step value
and $\mu$ is a weighting constant that controls the size of the update.

More specifically, the gradient descent process determines the gradient values
as follows.

$$\frac{\partial E}{\partial W^*(\nu)} = 2\sum_{k=1}^{K} E(k,\nu)\,W(\nu)\,B^H(k,\nu) \qquad (10)$$

$$\frac{\partial E}{\partial \Lambda_s(k,\nu)} = -\mathrm{diag}\bigl(E(k,\nu)\bigr) \qquad (11)$$

$$\frac{\partial E}{\partial \Lambda_n(k,\nu)} = -\mathrm{diag}\bigl(W^H(\nu)\,E(k,\nu)\,W(\nu)\bigr) \qquad (12)$$

$$B(k,\nu) \equiv \hat{R}_x(k,\nu) - \Lambda_n(k,\nu) \qquad (13)$$

With equation 11 equal to zero, the routine can solve explicitly for
parameters $\Lambda_s(k,\nu)$, while parameters $\Lambda_n(k,\nu)$ and $W(\nu)$ are computed
with a gradient descent rule, e.g., new values of $\Lambda_s(k,\nu)$ and $W(\nu)$ are
computed with each pass through the routine until the new values of $W(\nu)$ are not
very different from the old values of $W(\nu)$, i.e., W is converged.
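One pass of the update described by equations 10, 11 and 13 might look like the following NumPy sketch. It is illustrative only: the function name `gradient_step` and the array layouts are assumptions, the noise powers are held fixed rather than updated through equation 12, and the explicit solution of equation 11 is taken to be $\Lambda_s = \mathrm{diag}(W B W^H)$, which is what setting the diagonal of E to zero implies.

```python
import numpy as np

def gradient_step(W, R, lam_n, mu):
    """One gradient descent pass on W for cost function (8).

    W     : (T, d_s, d_x) complex filter, one matrix per frequency bin.
    R     : (K, T, d_x, d_x) cross-power spectra (Hermitian per entry).
    lam_n : (K, T, d_x) real noise powers, kept fixed in this sketch.
    mu    : step size.
    Returns the updated W, the signal powers lam_s (K, T, d_s), and the
    cost evaluated at the input W.
    """
    d_s, d_x = W.shape[-2], W.shape[-1]
    B = R - lam_n[..., :, None] * np.eye(d_x)             # eq. 13
    Wh = np.conj(np.swapaxes(W, -1, -2))
    WBWh = W @ B @ Wh                                     # (K, T, d_s, d_s)
    # Setting eq. 11 to zero forces diag(E) = 0, so lam_s is the diagonal.
    lam_s = np.real(np.einsum('kfii->kfi', WBWh))
    E = WBWh - lam_s[..., :, None] * np.eye(d_s)          # error of eq. 8
    Bh = np.conj(np.swapaxes(B, -1, -2))
    grad_W = 2.0 * np.sum(E @ W @ Bh, axis=0)             # eq. 10
    cost = float(np.sum(np.abs(E) ** 2))
    return W - mu * grad_W, lam_s, cost
```

Calling the function repeatedly and stopping when the change in W falls below a tolerance mirrors the convergence test described above; the tolerance itself is left to the caller.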

Note that equation 8 contains an additional constraint on the filter size in
the time domain. Up to that constraint it would seem the various frequencies
$\nu = 1, \ldots, T$ represent independent problems. The solutions $W(\nu)$ however are
restricted to those filters that have no time response beyond $\tau > Q$, where
$Q \ll T$. Effectively, the routine parameterizes $T d_s d_x$ filter coefficients in
$W(\nu)$ with $Q d_s d_x$ parameters $W(\tau)$. In practice, the values of W are produced
in the frequency domain, at step 214, e.g., $W(\nu)$, then, at step 218, an FFT is
performed on these frequency domain values to convert the values of $W(\nu)$ to the
time domain, e.g., $W(\tau)$. In the time domain, any W value that appears at a time
greater than a time Q is set to zero and all values in the range below Q are not
adjusted. The adjusted time domain values of W are then converted using an inverse
FFT back to the frequency domain. By zeroing the filter response in the time domain
for all time greater than Q, the frequency response of the filter is smoothed such
that a unique solution at each frequency is readily determined.
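The time-domain constraint can be sketched as a projection applied after each update. In the NumPy sketch below the name `constrain_filter_length` and the array layout are assumptions. The sketch follows NumPy's convention, in which `ifft` maps frequency to time and `fft` maps back; this is the reverse of the transform names used in the text and is a choice of this illustration.

```python
import numpy as np

def constrain_filter_length(W_freq, Q):
    """Force the filter to have no time response beyond tau = Q.

    W_freq : (T, d_s, d_x) complex filter in the frequency domain
             (hypothetical layout, frequency along the first axis).
    Q      : largest permitted delay, Q < T.
    Returns the constrained filter, again in the frequency domain.
    """
    W_time = np.fft.ifft(W_freq, axis=0)   # frequency -> time
    W_time[Q + 1:] = 0.0                   # zero every tap with tau > Q
    return np.fft.fft(W_time, axis=0)      # time -> frequency
```

A filter that already satisfies the constraint passes through unchanged, so applying the projection twice gives the same result as applying it once.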

FIG. 3 depicts an illustration of two frequency responses 302A and 302B
and FIG. 4 depicts an illustration of their corresponding time-domain responses
304A and 304B. The least squares solutions for the coefficients are found using
a gradient descent process performed at step 224 such that an iterative
approach is used to determine the correct values of W. Once the gradient in
equation 10 becomes "flat" as identified in step 220, the routine, at step 222,
applies the computed filter coefficients to an FIR filter. The FIR filter is used to
filter the samples of the input (mixed) signal x(t) in the time period KNT in
length. The FIR filter generates, at step 226, the decorrelated component
signals of the mixed signal. Then, the routine at step 228 gets the next KNT
number of samples for processing and proceeds to step 200 to filter the next
KNT samples. The previous KNT samples are removed from memory.

As mentioned above, the gradient equations are constrained to remain in
the subspace of permissible solutions with $W(\tau) = 0$ for $\tau > Q$, where
$Q \ll T$. This is important since it is a necessary condition for equation 8 to
achieve a good approximation.

In practical applications, such as voice recognition signal preprocessing,
the inventive routine substantially enhances voice recognition accuracy, i.e.,
word error rates improve by 5 to 50% and in some instances approach the error
rate that is achieved when no noise is present. Error rate improvement has been
shown when a desired voice signal is combined with either music or another
voice signal as background noise.

Although various embodiments which incorporate the teachings of the
present invention have been shown and described in detail herein, those skilled
in the art can readily devise many other varied embodiments that still
incorporate these teachings.

1. A method for separating a mixed signal into a plurality of component signals
comprising the steps of:

(a) producing a plurality of discrete Fourier transform (DFT) values,
where a DFT value is produced for every T samples of said mixed signal;

(b) producing a cross correlation estimation matrix that is averaged over
N of the DFT values;

(c) repeating step (b) K times to produce a plurality of cross correlation
estimation values;

(d) computing, using a gradient descent process, a plurality of filter
coefficients for a finite impulse response (FIR) filter using the cross correlation
estimation values; and

(e) filtering the mixed signal using the FIR filter having the computed
filter coefficients to separate the mixed signal into the plurality of component
signals.

2. The method of claim 1 wherein step (d) further comprises the steps of:

(d1) transforming the filter coefficients from the frequency domain into
the time domain;

(d2) zeroing any filter coefficients having a value other than zero for any
time that is greater than a predefined time Q to produce adjusted time domain
filter coefficients, where Q is less than T; and

(d3) transforming the adjusted time domain filter coefficients from the
time domain into the frequency domain to produce the filter coefficients used in
the gradient descent process.

3. The method of claim 1 wherein the cross correlation estimation values are
produced in step (b) using the following equation:

$$\hat{R}_x(t,\nu) = \frac{1}{N}\sum_{n=0}^{N-1} \chi(t+nT,\nu)\,\chi^H(t+nT,\nu)$$

where $\chi(t,\nu)$ is the mixed signal in the frequency domain.

4. The method of claim 3 wherein the gradient descent process minimizes the
function:

$$\sum_{\nu=1}^{T}\sum_{k=1}^{K} \|E(k,\nu)\|^2$$

where E(k,ν) is an error function.

5. Apparatus for separating a mixed signal into a plurality of component signals
comprising:

means for producing a plurality of discrete Fourier transform (DFT)
values, where a DFT value is produced for every T samples of said mixed signal;

means for producing a plurality of cross correlation estimation values
using the DFT values;

a gradient descent processor for computing a plurality of filter coefficients
for a finite impulse response (FIR) filter using the cross correlation estimation
values; and

a filter for filtering the mixed signal using the FIR filter having the
computed filter coefficients to separate the mixed signal into the plurality of
component signals.

6. The apparatus of claim 5 wherein the gradient descent processor further
comprises:

a first transformer for transforming the filter coefficients from the
frequency domain into the time domain;

means for zeroing any filter coefficients having a value other than zero for
any time that is greater than a predefined time Q to produce adjusted time
domain filter coefficients, where Q is less than T; and

a second transformer for transforming the adjusted time domain filter
coefficients from the time domain into the frequency domain to produce the
filter coefficients used in the gradient descent process.

7. The apparatus of claim 5 wherein the cross correlation estimation values are
produced using the following equation:

$$\hat{R}_x(t,\nu) = \frac{1}{N}\sum_{n=0}^{N-1} \chi(t+nT,\nu)\,\chi^H(t+nT,\nu)$$

where $\chi(t,\nu)$ is the mixed signal in the frequency domain.

8. The apparatus of claim 5 further comprising a voice recognition system for
processing at least one of the plurality of component signals.

9. The apparatus of claim 5 further comprising a plurality of microphones that 
produce the mixed signal. 

10. The apparatus of claim 5 wherein the gradient descent processor minimizes
the function:

$$\sum_{\nu=1}^{T}\sum_{k=1}^{K} \|E(k,\nu)\|^2$$

where E(k,ν) is an error function.

11. A computer readable storage medium containing a program that, when
executed upon a general purpose computer system, causes the general purpose
computer system to become a specific purpose computer system that performs a
method for separating a mixed signal into a plurality of component signals
comprising the steps of:

(a) producing a plurality of discrete Fourier transform (DFT) values,
where a DFT value is produced for every T samples of said mixed signal;

(b) producing a cross correlation estimation matrix that is averaged over
N of the DFT values;

(c) repeating step (b) K times to produce a plurality of cross correlation
estimation values;

(d) computing, using a gradient descent process, a plurality of filter
coefficients for a finite impulse response (FIR) filter using the cross correlation
estimation values; and

(e) filtering the mixed signal using the FIR filter having the computed
filter coefficients to separate the mixed signal into the plurality of component
signals.

12. The computer readable medium of claim 11 wherein step (d) of the method
further comprises the steps of:

(d1) transforming the filter coefficients from the frequency domain into
the time domain;

(d2) zeroing any filter coefficients having a value other than zero for any
time that is greater than a predefined time Q to produce adjusted time domain
filter coefficients, where Q is less than T; and

(d3) transforming the adjusted time domain filter coefficients from the
time domain into the frequency domain to produce the filter coefficients used in
the gradient descent process.


13. The computer readable medium of claim 11 wherein the cross correlation
estimation values are produced in step (b) using the following equation:

$$\hat{R}_x(t,\nu) = \frac{1}{N}\sum_{n=0}^{N-1} \chi(t+nT,\nu)\,\chi^H(t+nT,\nu)$$

where $\chi(t,\nu)$ is the mixed signal in the frequency domain.


14. The computer readable medium of claim 11 wherein the gradient descent
process minimizes the function:

$$\sum_{\nu=1}^{T}\sum_{k=1}^{K} \|E(k,\nu)\|^2$$

where E(k,ν) is an error function.